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Abstract 



We compute Neuberger's overlap operator by the Lanczos algorithm applied to 
the Wilson-Dirac operator. Locality of the operator for quenched QCD data and 
its eigenvalue spectrum in an instanton background are studied. 
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1. Although brute force calculations of the quenched lattice QCD with Wilson fermions 
have been able to approach the chiral limit Q, there are increased efforts to make the 
chiral symmetry exact on the lattice @, 0| . Q 

There are different starting points to formulate lattice actions with exact lattice chiral 
symmetry, but all of them seem to obey the Ginsparg- Wilson condition ||: 

75/T 1 + D~ x l5 = a^a-\ (1) 



1 For a recent review on the topic see 
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where a is the lattice spacing, D is the lattice Dirac operator and a -1 is a local operator 
and trivial in the Dirac space. 

A candidate is the overlap operator of Neuberger 0: 

£> = l- 75 sign(#), H = l5 {a-aD w ) (2) 

where a is a shift parameter in the range (0,2), which we have fixed at one and D\y is 
the Wilson-Dirac operator, 

Dw = \Y.b^; + d,)-ad;d,\ (3) 

and dfj_ and <9* are the nearest-neighbor forward and backward difference operators. 

The locality has been shown for smooth background fields and no violation has been 
observed in quenched samples simulated at moderate couplings [|7|. 

2. So far, all the methods devised to compute the overlap operator by usual iterative 
solvers have been based on (rational) polynomial approximations of the inverse square root 
or the sign function || [|. (Mathematical foundations of these methods are reviewed in 



PH|.) But they may exceed the storage limits in some machines. This is not the case with 



Legendre |L1] and Chebyshev polynomials, which on the other hand are not optimal 

HI- 

In the present work we propose a new method, which uses the outcome of the Lanczos 
algorithm on H. The Lanczos iteration is known to approximate the spectrum of the 
underlying matrix in an optimal way and, in particular, it requires a constant memory 

0- 



Let Q n = [qi, . . . , q n ] be the set of orthonormal vectors, such that 

HQ n = Q n T n , qi = Pl b, p l = l/\\b\\ 2 (4) 

where T n is a tridiagonal and symmetric matrix. Here b stands for an arbitrary vector. 

2 After submission of this paper, the rational polynomial approximation method |8| was improved with 
respect to memory limits |l3| by running twice the Conjugate Gradient (CG) iteration, as it is the case 
here (see below) for the Lanczos algorithm. 



By writing down the above decomposition in terms of the vectors i = 1, . . . , n and 
the matrix elements of T n , we arrive at a three term recurrence that allows to compute 
these vectors in increasing order, starting from the vector q\. This is called the Lanczos al- 
gorithm, which constructs a basis for the so called Krylov subspace: span(b, Hb, . . . , H n ~ l b) 
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In the last equation, it has been assumed that after n steps of the Lanczos algorithm, 
the Krylov subspace remains invariant. The task is the computation of x = (H 2 )~ l l 2 b. 
Our method is based on the following observations: Let (H 2 )^ 1 ^ 2 = f(H 2 ) be a matrix- 
valued function, for example Robert's integral formula flTU| : 



f(H 2 ) = - Tdt^ + H 2 )-' (5) 

7T JO 

Then, clearly: 



f(H 2 )Q n = Q n f(T 2 ) (6) 



Since, on the other hand, 



QneS n) /Pi, (7) 



where denotes the unit vector with n elements in the direction 1, we get: 

x = f{H 2 )b = Q n f{T 2 n )et l) / Pl (8) 

There are some remarks to be made here: 

a) By applying the Lanczos iteration on H, the problem of computing (H 2 )^ 1 ^ 2 reduces 
to the problem of computing {T 2 )~ l l 2 which is typically a much smaller problem than the 
original one. It can be solved for example by using the full decomposition of T n in its 
eigenvalues and eigenvectors; in fact this is the method we have employed too, for its 
compactness and the small overhead for moderate n. 

b) In the floating point arithmetic, there is a danger that once the Lanczos polynomial 
(algorithm) has approximated well some part of the spectrum, the iteration reproduces 



vectors which are rich in that direction [15). As a consequence, the orthogonality of the 



Lanczos vectors is spoiled with an immediate impact on the history of the iteration. 
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c) In general, there is no guarantee that the algorithm will converge at smaller n, 



unless n = rank(iJ) in exact arithmetic (T^|. Therefore, for a given n the equations (Q) 
and (H) hold approximately. 

Therefore, in practical implementations one should be satisfied with a stopping cri- 
terium such as: 

error(n) = \( Pl \\Hx n \\ 2 ) 2 - 1| 1/2 (9) 

is made small enough. 

It is worth writing down the error in terms of the Lanczos matrix; straightforward 
algebra gives: 

error(n) = \(3 n z^\, z n = (T^e^ (10) 

where j3 n is the element (n + l,n) of the matrix T„ +1 and 4 n) is the last component of 
the vector z n . 

Since H and (H 2 ) 1 ^ 2 are equally conditioned in 2-norm, we expect, that once the 
system Hx = b is solved, the system (if 2 ) 1//2 a; = b is also solved. In this context, it is 
desirable to compare the error ( |T0| ) with the residual error of the original system, r n . As 
before, in terms of the Lanczos matrix, it is given by: 

||pir„|| 2 = |A#|, y n = T- 1 et ) (11) 
As long as the orthogonality between Lanczos vectors is sufficiently maintained, equations 



( |lO| - |ll"l) should hold to a good accuracy. 

To implement the result (H), we first construct the Lanczos matrix and then compute 
z n . By repeating the iteration, we compute Lanczos vectors and obtain the result. We 
saved the scalar products, though it was not necessary. If we call 1/pi, i = 1, . . . the norm 
of the residual error of the system Hx = b, it is easy to show that 

Pi+iA + A«i + Pi-iPi-i = (12) 
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Therefore, we have the following algorithm for solving the system (H 2 ) l ' 2 x = b: 

Po = 0, pi = I/H&H2, <?o = o, qi = pib 
for i — 1, . . . 

v = Hqi 

on = qjv 

v := v - qicti - qi-iPi-i 
A = 1Mb 
q l+ i = v/(3i 

_ _ Pi<*i+Pi-l/3j-l 
~ ft 

ifr^-\ < tol, n = i, end for 

J \pi+i\ ' ' J 

(13) 

Set (T n ) i:i = at, (T„)i+i,i = (T„)i ii+1 = pi, otherwise {T n ) id = 
^ = (T„ 2 )-V2 = f/ n (A^)- 1/2 ^ T e^ 

g = 0, gi = Pib, x = o 
for i — 1, . . . , n 

Xi = Xi-i + qiZ® /pi 
v = Hqi 

v := v - qiCti - qi-iPi-i 
q i+ i = v/pi 

where by o we denote a vector with zero entries and U n , A n the matrices of the egienvectors 
and eigenvalues of T n . Note that there are only four large vectors necessary to store: 
qi„ x ,qi,v,Xi. 

Obviously, the memory doesn't grow with n. This is not the case for the shifted CG 
iterations (|| ||) needed to compute the (rational) polynomial approximation of (if 2 ) -1 / 2 . 
Since there is a one to one connection between CG and Lanczos, such approximations on 
the original matrix are transfered to the corresponding Lanczos matrix |TJ]]. This should 
be contrasted with the exact computation of (T 2 ) -1 / 2 . 

To test the above analysis, we have performed simulations of SU(3) gauge theory at 



(3 = 6.0 on a 8 3 16 lattice and picked up an equilibrated configuration. 

In Fig.l we show the residual error |(pi||ifx n ||2) 2 — l] 1 ^ 2 computed directly from x n 
and compare with the same quantity given in terms of the Lanczos matrix, i.e. \(3 n z^f > \. 
It fluctuates between two branches: the upper one corresponds to odd n, the number 
of matrix-vector multiplications, and the lower one to even values of n. For large n, the 
computed and estimated errors deviate from each other, which may indicate accumulation 
of roundoff errors in the computed residual error. Note that we have employed 64 bit 
precision. 

For comparison, we show in Fig.2 the residual error ||pir„||2 of the system Hx = b as 
computed directly and from \f3nVn \- Again, we have two branches as explained above, but 
here there is no distinction between the computed and estimated errors. The appearance 
of two branches is not surprising since we are dealing with a non-definite matrix H. 

In Fig.3 we compare the computed residual errors of the systems Hx = b and (H 2 ) l l 2 x = 
b (upper branches). They are the same most of the time, unless n becomes large and de- 
viations become clearer. This behavior shows that both systems are solved at the same 
time, which should serve us as a guide, because the computation of Xi at each step i is 
very demanding. 

We have compared the efficiency of the above method with that of the rational poly- 
nomial approximation ||. First, we solved the system Hx = b by both Lanczos and CG 
with an 10~ 5 accuracy for the residual error. We needed the same number of multiplica- 
tions with H 2 , 226. To compute the inverse square root with the same accuracy (1CT 5 ), 
we applied then the above method and the rational polynomial method with N = 30, 
the number of the terms in the approximating sum (|§). (Smaller iV will give a lower 
accuracy: in Fig. 4 we display the norm of the residual error (|S]) as a function of N.) 
Therefore, if we stored the Lanczos vectors as in the rational polynomial method, both 
methods need about the same amount of work. To avoid memory restrictions, we increase 

the work by a factor of two. f] 

3 As it was stated above, after submission of this paper, the rational polynomial approximation method 
H was changed in the same fashion, by increasing the work by a facor of two Jl3[ . 
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3. An immediate application of the above method is to check the locality of Neu- 
berger's overlap operator D. 

We used 100 equilibrated SU(3) configurations at (3 — 5.7,6.0 on a 8 3 16 lattice. For 
each configuration we computed the absolute value of D elements in the first column with 
space, spin and color indices fixed at one, a selection that suffices to look for violations of 
the locality. 

The average over 100 configurations is plotted in Fig. 5. At both values of (3, there 
was no single configuration to show an exceptional behavior: the maximum deviation is 
slightly above the mean value. The values of D decrease rapidly with the time slices. For 
large t/a but away from the center, Q they fall off exponentially. 

Recently, it has been shown that for (3 = 6.0, 6.2, 6.4 the occurrence of configurations 
with exceptionally small eigenvalues of H 2 becomes vanishingly small for 12 4 and 16 4 
large lattices, whereas for sufficiently smooth gauge fileds the locality is guaranteed 0. 

In fact at (3 — 5.7 the complete spectrum of 1 — Dyy is in the left half of the complex 
plane. [] We computed partial spectra of by the implicitely restarted Arnoldi iteration 
[3~7f| and found that indeed at this coupling there was no eigenvalue in the right half of 



the complex plane for all our 100 configurations. Therefore, there exist a (3 G (5.7, 6.0) at 
which the operator 1 — Dy/ becomes singular. In this case, there is an unbounded number 
of near zero modes of H and therefore D is no longer local. 

4. As another application we consider the lattice index theorem for an SU(2) instanton 
background on a small 4 4 lattice. An instanton on the lattice can be prepared in various 



ways. We follow [18[ and prepare an instanton with size p in the center of the lattice in 



the singular gauge. The index of D is given by: 

index(D) = iTr[sign(F)] (14) 

Because of the O(a) lattice errors, we expect the instanton being observed for p > a. 

4 Throughout the paper we have used periodic boundary conditions in all directions. 

5 I am grateful to Urs M. Heller for pointing out that normal spectroscopy with Wilson fermions can 

be done at j3 = 5.7 and k = 0.1675 Jl6|], the latter being greater than our corresponding n = 1/6. 
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We computed the smallest eigenvalues of D by an (implicitly restarted) Arnoldi iteration 
with the non-converged Ritz values used as explicit shifts []I7| . We fixed the number of 
Arnoldi steps at 32 and have stopped the iteration when the next starting vector norm 
is smaller than 10~ 6 . We have checked the stability of the computed eigenvalues by 
increasing the cutoff beyond the number of the converged eigenvalues. The stability is 
observed unless the cutoff becomes too large, which means that a larger Arnoldi matrix 
should be employed. 

We note that it is crucial for the eigenvalue computation to have a proper accuracy 
in the computation of D, which in our case has been set at a residual error norm (^) less 
than 1(T 10 . 

We have computed eigenvalues for p/a G [0.5, 1.5] with steps of 0.1. For brevity, we 
show in Table 1 eigenvalues of a smaller set of p. For p < a there are two zero modes, 
whereas for p > a there is a single zero mode present. As numerical accuracy is an issue 
here, we have perturbed the instanton background by applying a small fluctuating gauge 
field. The picture doesn't change, but the zero modes for p < a become nearly zero modes 
with opposite chiralities. 

We have also computed the eigenvalues exactly by standard QR algorithms. The 
instanton mode appears single in both approaches. While the values of the other smallest 
eigenvalues are reproduced exactly by the implicitly restarted Arnoldi algorithm, their 
multiplicity cannot be handled. Since D is normal, the Arnoldi matrix is normal and 
tridiagonal and therefore irreducible, giving no information on the multiplicity. The latter 
is essential when D has more exact zero modes and one must rely on block variants of 
the same algorithm. 

Even on such a small lattice, the computation of the zero modes is not the fastest 
method to compute the topological charge. Tracing the crossings of the smallest eigen- 
values of H is more practical. 

Nonetheless we note that an estimation of the topological charge can be made during 
the computation of D as described in this work. Having computed the Lanczos matrix 
of H one can estimate sign(if), which for our SU(2) instanton gives excellent agreement 
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for most of starting vectors b. In general, a separate study is needed to conclude on this 
method. 

5. To conclude: we have computed with a new method the overlap operator based 
on the Lanczos algorithm applied on the Wilson-Dirac operator. Compared to the other 
methods |], [|, its main advantage is of being free from memory restrictions. Additionally, 
there is no approximation made in the computation of the inverse square root of the 
Lanczos matrix. 

The locality of the overlap operator has been tested. We recommend to check it always 
before any other computation. 

The computation of D turns to be more difficult than Dyy. However, the so-called 
classically perfect actions [|J may help substantially to work on moderate lattices. Further 
studies are needed for the dynamical implementation of D. 

We are grateful to stimulating discussions with Ferenc Niedermayer on the topics 
covered by this work and to Roland Rosenfelder and Philippe de Forcrand for making 
critical remarks on this paper. 

We are grateful to Herbert Neuberger for comments on the method presented in this 
work following the posting of its first version. 

We thank PSI where this work was done and SCSC Manno for the allocation of 
computer time on the NEC SX4. 
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Figure 1: Residual error of the system (H 2 ) l l 2 x = b as defined in (|9|): plus symbols that 
jump between two branches. The same quantity defined in terms of the Lanczos matrix 
is displayed by the solid line. 
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Figure 2: Residual error of the system Hx = b: plus symbols that jump between two 
branches. The same quantity defined in terms of the Lanczos matrix is displayed by the 
solid line. 
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Figure 3: Residual error of the systems Hx = b (plus symbols) and (H 2 ) 1 / 2 x = b (circle 
symbols) for odd n. 
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Figure 4: Residual error of the system (H 2 ) l l 2 x = b as a function of iV in the rational 
polynomial approximation. 
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Figure 5: Absolute value of D elements in the first column with space, spin and color 
indices fixed at one, as a function of time indices at f3 — 6.0 (circles) and (3 = 5.7 (stars). 
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p = 0.9a 


p = 1.0a 


p = 1.1a 






0.277E -ll+i0.660E -13 


0.124E -12 -i0.198E -14 


0.572E -11 -i0.694E -11 


0.303E+00 -i0.717E+00 


0.955E -ll+i0.167E -13 


0.575E -ll+i0.697E -11 


0.303E+00+i0.717E+00 


0.947E+00 -i0.998E+00 


0.940E+00 -i0.998E+00 


0.303E+00 -i0.717E+00 


0.947E+00+i0.998E+00 


0.940E+00+i0.998E+00 


0.303E+00+i0.717E+00 


0.957E+00 -i0.997E+00 


0.950E+00 -i0.995E+00 


0.931E+00 -i0.997E+00 


0.957E+00+i0.997E+00 


0.950E+00+i0.995E+00 


0.931E+00+i0.997E+00 


0.102E+01 -i0.999E+00 


0.102E+01 -i0.999E+00 


0.939E+00 -i0.997E+00 


0.102E+01+i0.999E+00 


0.102E+01+i0.999E+00 


0.939E+00+i0.997E+00 


0.104E+01 -i0.998E+00 


0.104E+01+i0.997E+00 


0.103E+01 -i0.998E+00 


0.104E+01+i0.998E+00 


0.104E+01 -i0.997E+00 


0.103E+01+i0.998E+00 



Table 1: Smallest eigenvalues of D for three instanton sizes p 
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